adding pandas.api.typing.aliases and docs #61735

Open
wants to merge 7 commits into
base: main

Contributor

@Dr-Irv commented Jun 29, 2025

This is my first proposal for adding the typing aliases that are "public" so that people do not import from pandas._typing.

@Dr-Irv requested a review from rhshadrach June 29, 2025 03:38
@simonjayhawkins added the "Typing" label (type annotations, mypy/pyright type checking) Jun 30, 2025
Member

@rhshadrach left a comment

If we are to make these public, what is the process of making changes to them?

.. currentmodule:: pandas.api.typing.aliases

The typing declarations in ``pandas/_typing.py`` are considered private, and used
by pandasdevelopers for type checking of the pandascode base. For users, it is
Member

Suggested change
by pandasdevelopers for type checking of the pandascode base. For users, it is
by pandas developers for type checking of the pandas code base. For users, it is

This also occurs more times below.

Contributor Author

fixed in next commit

@@ -83,6 +83,7 @@ Other enhancements
- Add ``"delete_rows"`` option to ``if_exists`` argument in :meth:`DataFrame.to_sql` deleting all records of the table before inserting data (:issue:`37210`).
- Added half-year offset classes :class:`HalfYearBegin`, :class:`HalfYearEnd`, :class:`BHalfYearBegin` and :class:`BHalfYearEnd` (:issue:`60928`)
- Added support to read and write from and to Apache Iceberg tables with the new :func:`read_iceberg` and :meth:`DataFrame.to_iceberg` functions (:issue:`61383`)
- Certain aliases from :py:mod:`pandas._typing` are now exposed in :py:mod:`pandas.api.typing.aliases` (:issue:`55231`)
Member

I would suggest not advertising where they come from.

Suggested change
- Certain aliases from :py:mod:`pandas._typing` are now exposed in :py:mod:`pandas.api.typing.aliases` (:issue:`55231`)
- Many type aliases are now exposed in the new submodule :py:mod:`pandas.api.typing.aliases` (:issue:`55231`)

Contributor Author

in next commit

Axes,
Axis,
ColspaceArgType,
CompressionOptions,
Member

There are many type aliases here where it is not clear what method(s) they are appropriate for. E.g. it would be wrong to use this for DataFrame.to_parquet.

Contributor Author

I tried to cover that in the docs, without getting too specific. I can make the docs more specific, although there are cases where the aliases are used in lots of methods, so the list can get quite long. E.g., for CompressionOptions, I said "Argument type for compression in many I/O output methods".

Open to suggestions as to how to better document this.

Member

The only resolution I see is to introduce more aliases, e.g. ParquetCompressionOptions and CsvCompressionOptions. This would be my preference, but I can understand if there is an aversion to this.

In any case, if we deem something to be not "sufficiently good" I think we should refrain from releasing something new. That is my take on some of the aliases here, but I won't block if I'm alone.

Contributor Author

The current aliases follow what's in the code. So in your example, right now the type for compression in to_parquet() is str | None, while for to_csv() it is CompressionOptions. If we improve the typing in the code, then we can improve it here by introducing new aliases.

Member

Shouldn't those improvements be made prior to making them public?

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 1, 2025

> If we are to make these public, what is the process of making changes to them?

My suggestion would be that if someone adds an alias to pandas._typing.py that is used as an argument or return type of a documented pandas method, then they should update the pandas/api/typing/aliases.py file and doc/source/reference/aliases.rst . Should I add something to the contributors guide about that?

@rhshadrach
Member

@Dr-Irv - my question is about how do we go about changing the definition of aliases that we have already made public, not about adding new aliases.

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 1, 2025

> @Dr-Irv - my question is about how do we go about changing the definition of aliases that we have already made public, not about adding new aliases.

We just edit pandas._typing.py and we don't have to make changes elsewhere. Am I still misunderstanding your question?

@rhshadrach
Member

> We just edit pandas._typing.py and we don't have to make changes elsewhere. Am I still misunderstanding your question?

And break user code without warning? Can we introduce such breakages in minor or patch releases? While most breakages I would expect to be of a type-checking nature and therefore an annoyance, type-hints can be enforced in runtime and changes in this regard can introduce runtime breakages as well.

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 2, 2025

> > We just edit pandas._typing.py and we don't have to make changes elsewhere. Am I still misunderstanding your question?
>
> And break user code without warning? Can we introduce such breakages in minor or patch releases? While most breakages I would expect to be of a type-checking nature and therefore an annoyance, type-hints can be enforced in runtime and changes in this regard can introduce runtime breakages as well.

I am pretty sure we can change the definition of an alias without breaking user code, unless people do introspection on those aliases, which is not a supported usage of aliases anyway. For example, let's say we implement a new sorting algorithm and change SortKind to include the new sorting method, user code won't break.

If we deleted or renamed an alias, then user code could potentially break. But at least my observation has been (by getting alerts to when anyone makes PRs that change pandas._typing.py) that we don't make such changes to pandas._typing.py (which would then propagate to pandas.api.typing.aliases).

The renaming issue probably exists for everything in pandas.api.typing - have we committed to those names as well?
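
To make the distinction in this comment concrete, here is a minimal sketch using only the standard library. `SortKindV1` and `SortKindV2` are hypothetical stand-ins for an alias before and after a new value is added; they are not the definitions in `pandas/_typing.py`, and `user_sort_v1`, `user_sort_v2`, and `allowed_kinds` are illustrative names. Under those assumptions, code that only annotates with the alias runs the same after the alias is widened, while code that introspects the alias observes the change.

```python
from typing import Literal, get_args

# Hypothetical stand-ins for an alias before and after widening.
# These are NOT the pandas definitions.
SortKindV1 = Literal["quicksort", "mergesort", "heapsort"]
SortKindV2 = Literal["quicksort", "mergesort", "heapsort", "newsort"]


def user_sort_v1(values: list, kind: SortKindV1 = "quicksort") -> list:
    # The annotation is metadata for type checkers; Python itself does not
    # validate `kind` against the alias when the function is called.
    return sorted(values)


def user_sort_v2(values: list, kind: SortKindV2 = "quicksort") -> list:
    # Same body, annotated with the widened alias.
    return sorted(values)


def allowed_kinds(alias) -> tuple:
    # Introspection of the alias: the result depends on its definition.
    return get_args(alias)


# Annotation-only usage gives the same result with either alias.
print(user_sort_v1([3, 1, 2], kind="heapsort"))
print(user_sort_v2([3, 1, 2], kind="heapsort"))

# Introspecting code sees the value that was added.
print(set(allowed_kinds(SortKindV2)) - set(allowed_kinds(SortKindV1)))
```

The sketch only covers the widening case; removing or renaming a value is the case debated in the comments that follow.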

@rhshadrach
Member

> For example, let's say we implement a new sorting algorithm... user code won't break.

Or remove or rename an existing sorting algorithm?

> unless people do introspection on those aliases, which is not a supported usage of aliases anyway

I think you're saying we don't support the enforcement of pandas type-aliases at runtime (e.g. use with Pydantic), is that right? Is this documented?

> But at least my observation has been... that we don't [delete or rename type aliases]

That's fine, but I'm -1 here until we have a plan that is documented about how we would do so if such a case were to come up. I'm very flexible on what that plan could be, but there needs to be a plan.

> The renaming issue probably exists for everything in pandas.api.typing - have we committed to those names as well?

These are public classes and need to go through the usual deprecation cycle if we were to remove or rename.

by pandas developers for type checking of the pandas code base. For users, it is
highly recommended to use the ``pandas-stubs`` package that represents the officially
supported type declarations for users of pandas.
Note that the definitions and use cases of these aliases are subject to change.
Member

I think this is implying that they are subject to change without any user notice. If that is the case, can this be made more explicit and put in a .. warning:: box. Perhaps something like

... are subject to change without notice in any major, minor, or patch release of pandas.

I would also be okay with only saying major or minor; it seems okay to me saying we can promise not to make changes in patch releases.

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 2, 2025

> > For example, let's say we implement a new sorting algorithm... user code won't break.
>
> Or remove or rename an existing sorting algorithm?

So if we were to change the runtime allowable string for a sorting algorithm, e.g., "quicksort" becomes "Quicksort", or we were to remove "heapsort" from SortKind, and someone was using either "quicksort" or "heapsort" in their code, the code would fail at runtime. But that is independent of the alias changing its definition. In fact, if we updated the alias to do the renaming and/or removal, the type checker would pick up the change. My point here is that when we make such a change, a user who is not using the alias would only find out when their code breaks at runtime, while a user who is using the alias, presumably for type checking, would have the type checker catch it for them.

> > unless people do introspection on those aliases, which is not a supported usage of aliases anyway
>
> I think you're saying we don't support the enforcement of pandas type-aliases at runtime (e.g. use with Pydantic), is that right? Is this documented?

The code is inconsistent. Sometimes we check that the arguments are of the right possible values, sometimes we don't. But it is not related to the aliases themselves. My sense is that we shouldn't document this at all. We say that the aliases are for type checking.

> > But at least my observation has been... that we don't [delete or rename type aliases]
>
> That's fine, but I'm -1 here until we have a plan that is documented about how we would do so if such a case were to come up. I'm very flexible on what that plan could be, but there needs to be a plan.

I think we have to treat them like we do other code changes. Not sure where to document that.

> > The renaming issue probably exists for everything in pandas.api.typing - have we committed to those names as well?
>
> These are public classes and need to go through the usual deprecation cycle if we were to remove or rename.

So we can do that if we decide to rename or delete an alias, right?

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 2, 2025

Also worth mentioning that @simonjayhawkins suggested making this "experimental" in #55231 (comment), although I'm not sure that's the right word here. I think the warning you suggested covers this, and I have added that in the most recent commit.

@rhshadrach
Member

> I think we have to treat [changes to type aliases] like we do other code changes.

I do not think this is possible. To my knowledge we have no process to warn users of the upcoming change to a type alias. This is unlike other parts of the pandas code where we can emit deprecation warnings, put behaviors behind flags, and the like. Happy to be wrong here; to make this explicit could you detail how we'd go about adding or removing a case to ArrayLike?

> My sense is that we shouldn't document this at all. We say that the aliases are for type checking.

A large part of the community is also enforcing type-hints at runtime, e.g. via Pydantic. It seems to me if we are going to make these public, we should not handcuff users by disallowing this kind of usage.
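
The concern about runtime enforcement can be illustrated with a small standard-library sketch. The `enforce` helper below is a hypothetical, much-simplified stand-in for what a runtime validation library does; it is not how Pydantic is implemented, and `SortKindOld` / `SortKindNew` are invented aliases rather than pandas definitions. Assuming a validator that derives the permitted values from the annotation, removing a value from the alias changes the outcome of a call that previously passed.

```python
from typing import Literal, get_args, get_origin, get_type_hints

# Hypothetical aliases: the same name before and after a value is removed.
SortKindOld = Literal["quicksort", "mergesort", "heapsort"]
SortKindNew = Literal["quicksort", "mergesort"]  # "heapsort" removed


def sort_with_old_alias(kind: SortKindOld) -> str:
    return kind


def sort_with_new_alias(kind: SortKindNew) -> str:
    return kind


def enforce(func, **kwargs):
    """Call func after checking Literal-annotated keyword arguments."""
    hints = get_type_hints(func)
    for name, value in kwargs.items():
        hint = hints.get(name)
        # Only Literal annotations are checked in this simplified sketch.
        if get_origin(hint) is Literal and value not in get_args(hint):
            raise ValueError(f"{name}={value!r} is not permitted by {hint}")
    return func(**kwargs)


print(enforce(sort_with_old_alias, kind="heapsort"))  # accepted

try:
    enforce(sort_with_new_alias, kind="heapsort")
except ValueError as exc:
    # The same call is rejected once the alias no longer lists the value.
    print("rejected:", exc)
```

Whether pandas should support this style of usage is exactly the open question in this thread; the sketch only shows why a definition change is visible to such code.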

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 3, 2025

> > I think we have to treat [changes to type aliases] like we do other code changes.
>
> I do not think this is possible. To my knowledge we have no process to warn users of the upcoming change to a type alias. This is unlike other parts of the pandas code where we can emit deprecation warnings, put behaviors behind flags, and the like. Happy to be wrong here; to make this explicit could you detail how we'd go about adding or removing a case to ArrayLike?

I don't think we have to notify in this case. TypeAlias is only used for type checking. There is nothing about the definition that affects runtime behavior.

> A large part of the community is also enforcing type-hints at runtime, e.g. via Pydantic. It seems to me if we are going to make these public, we should not handcuff users by disallowing this kind of usage.

Yes, but I don't think you can enforce TypeAlias type-hints at runtime. You can enforce it on classes and basic python types, but not aliases.

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 4, 2025

> Yes, but I don't think you can enforce TypeAlias type-hints at runtime. You can enforce it on classes and basic python types, but not aliases.

For example - you can't call isinstance() on a TypeAlias:

>>> from pandas._typing import ArrayLike
>>> ArrayLike
typing.Union[ForwardRef('ExtensionArray'), numpy.ndarray]
>>> import numpy as np
>>> arr=np.array([1,2,3])
>>> isinstance(arr, np.ndarray)
True
>>> isinstance(arr, ArrayLike)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Condadirs\envs\pandasstubs\lib\typing.py", line 1260, in __instancecheck__
    return self.__subclasscheck__(type(obj))
  File "C:\Condadirs\envs\pandasstubs\lib\typing.py", line 1264, in __subclasscheck__
    if issubclass(cls, arg):
TypeError: issubclass() arg 2 must be a class, a tuple of classes, or a union

So these only have value in type declarations.
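
The same behaviour can be reproduced without pandas installed. The following is a sketch under stated assumptions: `ExtensionArrayStandIn` and `ArrayLikeStandIn` are hypothetical names that only mimic the shape of the alias shown above (a forward reference followed by a concrete class), and `try_isinstance` is an illustrative helper. Because the outcome of `isinstance()` on such an alias can differ between Python versions, the helper reports `None` when the check raises instead of assuming one result.

```python
from typing import Optional, Union


class ExtensionArrayStandIn:
    """Hypothetical placeholder; not a pandas class."""


# Hypothetical alias shaped like the one in the session above:
# a forward reference (a string) first, then a concrete class.
ArrayLikeStandIn = Union["ExtensionArrayStandIn", bytes]


def try_isinstance(obj, tp) -> Optional[bool]:
    """Return the isinstance() result, or None if the check raises TypeError."""
    try:
        return isinstance(obj, tp)
    except TypeError:
        # This is the failure mode shown in the traceback above.
        return None


# A plain class always yields a real bool.
print(try_isinstance(b"abc", bytes))

# The alias may yield None (the check raised) depending on the interpreter.
print(try_isinstance("abc", ArrayLikeStandIn))
```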

@rhshadrach
Member

import numpy as np
from pydantic_settings import BaseSettings
from pandas._typing import ArrayLike

class Foo(BaseSettings):
    x: ArrayLike

Foo(x=np.ndarray([1, 2]))  # Succeeds
Foo(x=1)  # ValidationError

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 4, 2025

> from pydantic_settings import BaseSettings
> from pandas._typing import ArrayLike
>
> class Foo(BaseSettings):
>     x: ArrayLike
>
> Foo(x=np.ndarray([1, 2]))  # Succeeds
> Foo(x=1)  # ValidationError

I’m without laptop for 2 weeks and on a plane about to take off but I’m pretty sure the type checkers would also flag this as an error.

I wouldn’t expect people to use the aliases without type checking turned on. So the error above would be caught before runtime, i.e., by the type checkers. So if we assume people importing an alias would type check their code before executing it, then we should be fine.

I’m fine to put in the docs something that explains that if you think that helps.

Labels
Typing type annotations, mypy/pyright type checking
Development

Successfully merging this pull request may close these issues.

ENH: Export (a subset of?) pandas._typing for type checking
3 participants